Goto

Collaborating Authors

 stochastic gradient descent method


Trading Computation for Communication: Distributed Stochastic Dual Coordinate Ascent

Neural Information Processing Systems

We present and study a distributed optimization algorithm by employing a stochastic dual coordinate ascent method. Stochastic dual coordinate ascent methods enjoy strong theoretical guarantees and often have better performances than stochastic gradient descent methods in optimizing regularized loss minimization problems. It still lacks of efforts in studying them in a distributed framework. We make a progress along the line by presenting a distributed stochastic dual coordinate ascent algorithm in a star network, with an analysis of the tradeoff between computation and communication. We verify our analysis by experiments on real data sets. Moreover, we compare the proposed algorithm with distributed stochastic gradient descent methods and distributed alternating direction methods of multipliers for optimizing SVMs in the same distributed framework, and observe competitive performances.


First and Second Order Approximations to Stochastic Gradient Descent Methods with Momentum Terms

arXiv.org Artificial Intelligence

Stochastic Gradient Descent (SGD) methods see many uses in optimization problems. Modifications to the algorithm, such as momentum-based SGD methods have been known to produce better results in certain cases. Much of this, however, is due to empirical information rather than rigorous proof. While the dynamics of gradient descent methods can be studied through continuous approximations, existing works only cover scenarios with constant learning rates or SGD without momentum terms. We present approximation results under weak assumptions for SGD that allow learning rates and momentum parameters to vary with respect to time.


A Local Convergence Theory for the Stochastic Gradient Descent Method in Non-Convex Optimization With Non-isolated Local Minima

arXiv.org Machine Learning

Loss functions with non-isolated minima have emerged in several machine learning problems, creating a gap between theory and practice. In this paper, we formulate a new type of local convexity condition that is suitable to describe the behavior of loss functions near non-isolated minima. We show that such condition is general enough to encompass many existing conditions. In addition we study the local convergence of the SGD under this mild condition by adopting the notion of stochastic stability. The corresponding concentration inequalities from the convergence analysis help to interpret the empirical observation from some practical training results.


On a Variance Reduction Correction of the Temporal Difference for Policy Evaluation in the Stochastic Continuous Setting

arXiv.org Artificial Intelligence

This paper deals with solving continuous time, state and action optimization problems in stochastic settings, using reinforcement learning algorithms, and considers the policy evaluation process. We prove that standard learning algorithms based on the discretized temporal difference are doomed to fail when the time discretization tends to zero, because of the stochastic part. We propose a variance-reduction correction of the temporal difference, leading to new learning algorithms that are stable with respect to vanishing time steps. This allows us to give theoretical guarantees of convergence of our algorithms to the solutions of continuous stochastic optimization problems.


Trading Computation for Communication: Distributed Stochastic Dual Coordinate Ascent

Neural Information Processing Systems

We present and study a distributed optimization algorithm by employing a stochastic dual coordinate ascent method. Stochastic dual coordinate ascent methods enjoy strong theoretical guarantees and often have better performances than stochastic gradient descent methods in optimizing regularized loss minimization problems. It still lacks of efforts in studying them in a distributed framework. We make a progress along the line by presenting a distributed stochastic dual coordinate ascent algorithm in a star network, with an analysis of the tradeoff between computation and communication. We verify our analysis by experiments on real data sets. Moreover, we compare the proposed algorithm with distributed stochastic gradient descent methods and distributed alternating direction methods of multipliers for optimizing SVMs in the same distributed framework, and observe competitive performances.


Projected Semi-Stochastic Gradient Descent Method with Mini-Batch Scheme under Weak Strong Convexity Assumption

arXiv.org Machine Learning

We propose a projected semi-stochastic gradient descent method with mini-batch for improving both the theoretical complexity and practical performance of the general stochastic gradient descent method (SGD). We are able to prove linear convergence under weak strong convexity assumption. This requires no strong convexity assumption for minimizing the sum of smooth convex functions subject to a compact polyhedral set, which remains popular across machine learning community. Our PS2GD preserves the low-cost per iteration and high optimization accuracy via stochastic gradient variance-reduced technique, and admits a simple parallel implementation with mini-batches. Moreover, PS2GD is also applicable to dual problem of SVM with hinge loss.